Skip to content

Low-latency pitch detection (phases 1-4)#29

Merged
user1303836 merged 4 commits intomainfrom
feature/low-latency-pitch
Feb 16, 2026
Merged

Low-latency pitch detection (phases 1-4)#29
user1303836 merged 4 commits intomainfrom
feature/low-latency-pitch

Conversation

@user1303836
Copy link
Owner

@user1303836 user1303836 commented Feb 16, 2026

Summary

Overhaul pitch detection pipeline to reduce latency from ~93ms to ~23ms and improve note-tracking responsiveness.

Phase 1: Ring buffer + sliding window + remove decimation

  • Replaced linear fill-analyze-reset buffer with circular ring buffer and hop-based analysis
  • Removed decimation (was 2x/4x) -- analysis now runs at native sample rate
  • Window size scales with sample rate: halfWindow = ceil(sampleRate / 70.0), ensuring detection down to ~70Hz at any rate
  • Analysis triggers every ceil(sampleRate * 0.003) samples (~3ms) after initial window fill

Phase 2: FFT-accelerated difference function

  • Replaced O(N^2) nested-loop difference function with O(N log N) FFT cross-correlation
  • Uses performRealOnlyForwardTransform on first-half and full-signal separately, then conj(A)*B cross-spectrum and inverse FFT
  • Mathematically exact: computes C(tau) = sum_{j=0}^{n-1} x[j]*x[j+tau] via cross-correlation, not the simpler (but incorrect for low frequencies) full-signal autocorrelation
  • Tests link against juce_dsp via juce_add_console_app

Phase 3: PitchSmoother fast-attack/slow-release

  • Pitch changes larger than ~1 semitone (0.08 in log2) snap immediately (no glide)
  • Small changes within a semitone use configured smoothing for jitter reduction
  • Reduced max tau from 200ms to 50ms (matching ~3ms update interval)

Phase 4: Integration verification

  • Clean rebuild from scratch: all formats build (Standalone, AU, VST3)
  • 31 tests pass (12 YIN, 10 PitchSmoother, 6 Oscillator, 3 Parameter)
  • pluginval passes at strictness level 5 locally (VST3)
  • CI runs macOS + Windows builds with pluginval at strictness 10

Impact

Metric Before After
First pitch estimate ~93ms ~23ms (44.1k), ~29ms (48k)
Update interval ~93ms ~3ms
Difference function O(N^2) O(N log N)
Note change response Slow glide (100ms+ tau) Instant snap (>1 semitone)
Max smoothing tau 200ms 50ms

Files changed

File Phases Change
source/dsp/YinPitchDetector.h 1, 2 Ring buffer, hop, FFT members
source/dsp/YinPitchDetector.cpp 1, 2 Sliding window, FFT cross-correlation
source/dsp/PitchSmoother.h 3 Fast-attack/slow-release, reduced tau
tests/TestYinPitchDetector.cpp 1 Latency + accuracy tests
tests/TestPitchSmoother.cpp 3 Updated + new smoother tests
tests/CMakeLists.txt 2 Link juce_dsp for FFT

Test plan

  • All 31 tests pass (clean rebuild)
  • Plugin builds: Standalone, AU, VST3
  • pluginval VST3 strictness 5 (local)
  • CI: macOS + Windows build, test, pluginval strictness 10
  • Manual: sustained note -- effect engages within perceptually instant timeframe
  • Manual: rapid note changes -- no audible gliding between pitches

Remove decimation (was 2x at 44.1kHz, 4x at 96kHz+), reducing
effective window from 4096 to 1024 samples. Analysis now runs at
native sample rate with hop-based triggering every 128 samples.

First pitch estimate: ~93ms -> ~23ms
Update interval: ~93ms -> ~3ms
Copy link
Owner Author

@user1303836 user1303836 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #29 Review: Phase 1 - Ring buffer + sliding window + remove decimation

The core architectural change is sound -- replacing the fill-reset buffer with a ring buffer + hop-based analysis is the right approach, and the implementation is clean and minimal. The decimation removal and window reduction achieve the latency goals outlined in the plan. Good test coverage for the new latency properties.

Must fix (2 issues)

1. Window size doesn't scale with sample rate -- low-frequency detection regression.

With windowSize hardcoded to 1024 and decimation removed, halfWindow = 512 sets the maximum detectable period. At 44.1kHz, the lowest detectable frequency is ~86Hz, which already excludes E2 (82.4Hz, period ~535 samples). At 96kHz, E2's period is ~1164 samples -- far beyond halfWindow. The old decimated approach handled this implicitly. The window (and hop) size should scale with sample rate to maintain consistent frequency coverage. Alternatively, bump the base window to 1200 as the plan suggests for the 44.1k case, and scale from there.

2. sampleCount integer overflow on long sessions.

At 96kHz, a 32-bit int overflows in ~6.2 hours. After overflow, the detector stops producing results. Replace with a bool windowFilled flag that latches on first fill.

Suggestions (non-blocking, 2 items)

  • Hop interval test is fragile: relies on float jitter between consecutive analyses of the same sine to detect updates. Consider feeding a changing signal instead.
  • Relock test name/bound mismatch: the test name implies fast relock but the bound allows a full window refill. Consider renaming or tightening.

Phase 2 note

The linearBuffer copy in analyse() will become redundant once the FFT buffer is introduced. Not an issue now but worth keeping in mind to avoid double-copying.

Summary

The two must-fix items are the sample-rate-dependent window sizing (correctness regression for low frequencies and high sample rates) and the sampleCount overflow (silent failure after hours of use). Both are straightforward to address. The rest of the implementation is solid and well-aligned with the latency analysis plan.

int writePos = 0;
bool bufferFull = false;
int hopCounter = 0;
int sampleCount = 0;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sampleCount will overflow on long sessions.

sampleCount is an int that increments every sample and is never reset (except via prepare()). At 96kHz, a 32-bit int overflows in ~6.2 hours. After overflow, the sampleCount >= windowSize guard in feedSample() will fail, and the detector silently stops producing results until the next prepare() call.

This is a real-world scenario -- DAW sessions routinely run for hours.

Two options:

  1. Add a bool windowFilled = false; flag that gets set once sampleCount >= windowSize, then use that in the guard. This is the simplest fix and eliminates the counter entirely after the initial fill.
  2. Cap sampleCount at windowSize so it stops incrementing: if (sampleCount < windowSize) ++sampleCount;

Option 1 is cleaner since sampleCount would only exist to track the initial fill.


// Step 1: Difference function
for (int i = 0; i < windowSize; ++i)
linearBuffer[static_cast<size_t>(i)] = buffer[static_cast<size_t>((writePos + i) % windowSize)];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ring buffer linearization copies the full window every hop -- consider whether this matters for Phase 2.

Right now this copies 1024 floats every 128 samples, which is fine. But worth noting: when Phase 2 adds FFT-accelerated autocorrelation, you'll need to copy data into the FFT buffer anyway. At that point this linearBuffer copy becomes redundant -- the FFT input buffer serves the same purpose.

Not a blocker for this phase, just flagging so Phase 2 doesn't end up doing two copies.

decimationAccum = 0.0f;
analysisSR = sampleRate / decimation;
analysisSR = sampleRate;

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded windowSize = 1024 ignores sample rate -- E2 detection will fail at 96kHz.

At 44.1kHz, 1024 samples = ~23ms, which covers ~1.9 periods of E2 (82.4Hz). That's tight but workable with parabolic interpolation.

At 96kHz, 1024 samples = ~10.7ms, which covers only ~0.88 periods of E2. The YIN algorithm needs at least 2 periods in the analysis window (halfWindow = 512 samples = ~5.3ms at 96kHz = 0.44 periods). This is fundamentally insufficient -- YIN cannot find a valid tau for E2 because the period (~1164 samples at 96kHz) exceeds halfWindow.

The old code handled this via decimation (effectively analyzing at 24kHz at 96k SR). Since decimation is removed, the window size should scale with sample rate to maintain the same frequency coverage:

windowSize = static_cast<int>(sampleRate / 44100.0 * 1024);

This gives 1024 at 44.1k, 1115 at 48k, 2227 at 96k -- preserving the ~23ms analysis window regardless of sample rate.

The hopSize should probably scale proportionally too to maintain the same ~3ms update interval.

Comment on lines 186 to 196

for (int i = 0; i < 512; ++i)
{
float sample = static_cast<float>(std::sin(phase));
yin.feedSample(sample);
phase += inc;

auto r = yin.getResult();
if (r.frequency != prevFreq)
{
++updateCount;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "hop interval updates" test has a fragile detection mechanism.

This test checks that r.frequency != prevFreq to count updates. But on a continuous 440Hz sine, the detected frequency may be identical across consecutive analyses (YIN is deterministic on the same signal). The frequency only drifts slightly as the phase alignment of the window shifts with each hop.

The test passes now because floating-point differences in window alignment cause tiny frequency variations. But this is brittle -- a future improvement to detection stability (which is the goal of Phase 3's smoother work) could make consecutive results identical, causing this test to fail.

A more robust approach: feed a signal that actually changes (e.g., sweep from 440 to 880 Hz over 512 samples) and verify the output tracks the change. That way the test genuinely validates that updates are arriving, rather than relying on jitter.

REQUIRE(result.confidence >= 0.0f);
REQUIRE(result.confidence <= 1.0f);
}

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test: pitch accuracy at the low-frequency boundary (E2 ~82Hz).

The latency analysis specifically calls out E2 (82.4Hz) as marginal with a 1024 window at 44.1kHz. With halfWindow = 512, the maximum detectable period is 512 samples, which corresponds to 44100/512 = ~86Hz. E2 at 82.4Hz has a period of ~535 samples -- this exceeds halfWindow and will not be detected.

This is a correctness regression from the old code. The old decimated window (2048 at ~22050 Hz effective SR) had halfWindow = 1024 decimated samples, supporting down to ~21.5Hz.

A test for this would confirm the regression:

TEST_CASE("YIN: detects E2 (82.4 Hz) at 44100 Hz")
{
    YinPitchDetector yin;
    yin.prepare(44100.0);
    feedSine(yin, 44100.0, 82.4f, 44100);
    auto result = yin.getResult();
    REQUIRE(result.frequency > 0.0f);
    REQUIRE_THAT(static_cast<double>(result.frequency),
                 Catch::Matchers::WithinRel(82.4, 0.03));
}

This ties directly to the window size comment on YinPitchDetector.cpp -- the window likely needs to be larger (the plan mentions 1200 as a workable alternative).

REQUIRE(yin.getResult().frequency == 0.0f);

int samplesNeeded = feedSineUntilDetection(yin, 44100.0, 440.0f, 44100);

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relock test bound of 1024 + 128 may be misleading.

After 2048 silence samples, the ring buffer is full of zeros and analyses have been running every 128 samples (all returning 0Hz). When the sine starts, the next analysis fires within 128 samples, but that analysis window is ~87% zeros and ~13% sine. It likely won't detect pitch yet.

The test allows up to 1152 samples for detection. In reality, the window needs to be predominantly signal before YIN can find a valid tau. At 440Hz (period ~100 samples at 44.1k), you'd need several full periods in the window. The bound works but doesn't really test "fast relock" -- it tests "eventual detection within one full window refill."

Consider tightening the name or the bound to more precisely describe what's being verified.

- Replace sampleCount int with windowFilled bool flag to prevent
  overflow on long sessions
- Scale windowSize and hopSize with sample rate (halfWindow =
  ceil(sampleRate/70), hopSize = ceil(sampleRate*0.003)) to
  guarantee E2 detection at all sample rates
- Add E2 (82.4 Hz) detection test
- Replace fragile hop-interval test with frequency sweep test
- Rename and tighten bounds on latency/relock tests to use
  computed window size
Copy link
Owner Author

@user1303836 user1303836 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: All issues addressed -- LGTM

All items from the first review are resolved in fcb587d:

1. sampleCount overflow -- Replaced with bool windowFilled that latches once on first buffer wrap. No unbounded counter.

2. Window/hop scale with sample rate -- halfWindow = ceil(sampleRate / 70.0) guarantees E2 (82.4Hz) detection at all sample rates. The 70Hz floor gives comfortable margin (E2 period ~535 samples at 44.1kHz vs halfWindow=631). First-result latency is a consistent ~28.6ms across sample rates. hopSize = ceil(sampleRate * 0.003) maintains ~3ms update interval.

3. E2 test added -- Confirms no regression at the low-frequency boundary.

4. Fragile hop test replaced -- Frequency sweep test (440 -> 880Hz) is a robust way to verify hop-based updates are arriving.

5. Relock test renamed -- "detects pitch change after silence" with computed bounds.

The feedSample fill logic is clean: during initial fill, hopCounter accumulates but analysis is gated by windowFilled. On the exact wrap, it latches, falls through to the hop check (which passes since hopCounter == windowSize >> hopSize), and fires the first analysis immediately. Subsequent analyses fire at hopSize intervals.

No remaining issues. Ready to merge. Phase 2 (FFT-accelerated difference function) can build directly on this.

…tion

Compute YIN difference function via FFT cross-correlation between
the first half-window and full window, replacing the O(N^2) nested
loop with 2 forward FFTs + 1 inverse FFT (O(N log N)).

Uses JUCE's performRealOnlyForwardTransform/InverseTransform with
separate buffers for the first-half and full-signal spectra, then
combines via conj(A)*B cross-spectrum before inverse transform.
Power terms still computed incrementally for the exact YIN formula.

Tests link against juce_dsp via juce_add_console_app.
Copy link
Owner Author

@user1303836 user1303836 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 2 Review: FFT-accelerated difference function -- LGTM

Clean implementation. The O(N^2) nested loop is correctly replaced with FFT-based cross-correlation.

Math verification

The YIN difference function d(tau) = sum (x[j] - x[j+tau])^2 decomposes into powerTerm0 + powerTermTau - 2 * r_cross(tau). The code computes r_cross via:

  1. Forward FFT of first half-window x[0..n-1] (zero-padded) -> A
  2. Forward FFT of full window x[0..2n-1] (zero-padded) -> B
  3. Cross-spectrum conj(A) * B -- conjugate multiply is correct (Re: aRe*bRe + aIm*bIm, Im: aRe*bIm - aIm*bRe)
  4. Inverse FFT -> cross-correlation values at each lag

Zero-padding is sufficient: fftSize = 2^ceil(log2(2*windowSize)) >= 4n > 3n-1 (minimum for linear cross-correlation). No circular aliasing.

Power term update is correct: sliding window powerTermTau += x[n+tau-1]^2 - x[tau-1]^2 maintains the shifted energy incrementally. Max linearBuffer index is 2n-2, within the windowSize=2n bound.

CMNDF/threshold/parabolic interpolation: unchanged from the original. Only the difference function computation was swapped.

CMake changes

Switching to juce_add_console_app is the right call -- YinPitchDetector.h now includes <juce_dsp/juce_dsp.h>, requiring JUCE module infrastructure. juce_add_console_app doesn't provide its own main(), so no conflict with Catch2. Standard JUCE_WEB_BROWSER=0/JUCE_USE_CURL=0 boilerplate to avoid transitive deps.

No issues found

This is a textbook YIN FFT optimization (matches the aubio yinfft approach). The algorithm produces identical results -- validated by existing pitch accuracy tests. No new tests needed for a pure performance swap.

Ready for Phase 3 (PitchSmoother fast-attack/slow-release).

Pitch changes larger than ~1 semitone (0.08 in log2 space) now snap
immediately, eliminating audible glides on note transitions. Small
changes within a semitone still use the configured smoothing amount
for jitter reduction on sustained notes.

Reduce max smoothing tau from 200ms to 50ms (0.2f -> 0.05f) since
the detector now updates every ~3ms instead of ~93ms.

Update existing tests for within-semitone smoothing behavior and add
tests for instant note-change lock, confidence-gap hold, and
sustained-note stability.
Copy link
Owner Author

@user1303836 user1303836 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 3 Review: PitchSmoother fast-attack/slow-release -- LGTM

Minimal, correct change. 3 lines added in process(), 1 constant changed in recomputeAlpha().

Code changes

Fast-attack logic: delta > 0.08f ? 1.0f : alpha -- when pitch jumps more than ~1 semitone (0.08 in log2; 1 semitone = 0.0833), the smoother snaps instantly. Within a semitone, the configured smoothing applies. This correctly handles:

  • Note transitions: instant lock, no audible glide
  • Sustained notes with jitter: smoothed out
  • Vibrato (~50 cents = 0.042 in log2): smoothed, not tracked as note changes

The confidence gating is implicitly handled -- low-confidence readings never reach the delta check due to the existing early return at the top of process().

Reduced tau: 0.2f -> 0.05f reduces max smoothing from 200ms to 50ms. Appropriate since updates now arrive every ~3ms instead of ~93ms.

Test adaptations

The two existing tests that used octave jumps (220->440Hz) were correctly identified as broken by the new behavior (delta = 1.0 >> 0.08 would trigger instant lock, defeating the smoothing test). Smart fixes:

  • "smoothing slows convergence" -> "smoothing slows convergence within a semitone": 440->443Hz (delta ~0.012), 100 iterations. Fast (0.1) converges ~36% vs slow (0.9) at ~5%. Test validates within-semitone smoothing differentiation.
  • "log-frequency domain" -> "smoothing operates in log-frequency domain": 440->445Hz (delta ~0.016), 100 iterations. Verifies partial convergence in the smoothing regime.

New tests

All three tests from the plan are present:

  1. Instant lock on note change: 440->880Hz at max smoothing, verifies output = 880 within 0.1Hz on the very next sample.
  2. Holds during confidence gaps: Alternates (440, 1.0) and (0, 0), verifies output holds at 440 during gaps.
  3. Stable output on sustained note: 1000 iterations of 440Hz, verifies convergence to within 0.01Hz.

No issues found

The implementation matches the plan exactly. The threshold at 0.08 log2 (~0.96 semitones) is a reasonable boundary between "jitter to smooth" and "new note to snap to." No remaining concerns. Ready for Phase 4 integration verification.

@user1303836 user1303836 changed the title Phase 1: Ring buffer + sliding window + remove decimation Low-latency pitch detection (phases 1-4) Feb 16, 2026
@user1303836 user1303836 merged commit 5588b1c into main Feb 16, 2026
2 checks passed
@user1303836 user1303836 deleted the feature/low-latency-pitch branch February 16, 2026 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant